Project 2
For this project, I wanted to work with data from my country because, in the medium term, I hope to return and work there, ideally for the government. I believe it’s beneficial for my learning process to apply the concepts learned in this course to data I may use in the future.
I chose higher education enrollment as my focus.
The data was downloaded from the following link: https://datosabiertos.mineduc.cl/matricula-por-estudiante-2/
The file name in Spanish is “Matrícula por estudiante 2024,” which translates to “Enrollment per Student 2024.”
I downloaded data for 2019 through 2024 to analyze the evolution of enrollment over time.
#Import the pandas library, the csv module, and Plotly Express
import pandas as pd
import csv
import plotly.express as px
#I downloaded the data to my computer. In this step, I read the CSV and loaded a data frame so I could work with it.
enrollment24 = pd.read_csv("/Users/josefinadesolminihac/Desktop/computing/Project2/DataenrollmentChile/2024.csv", sep=';' )
enrollment23 = pd.read_csv("/Users/josefinadesolminihac/Desktop/computing/Project2/DataenrollmentChile/2023.csv", sep=';' )
enrollment22 = pd.read_csv("/Users/josefinadesolminihac/Desktop/computing/Project2/DataenrollmentChile/2022.csv", sep=';' )
enrollment21 = pd.read_csv("/Users/josefinadesolminihac/Desktop/computing/Project2/DataenrollmentChile/2021.csv", sep=';' )
enrollment20 = pd.read_csv("/Users/josefinadesolminihac/Desktop/computing/Project2/DataenrollmentChile/2020.csv", sep=';' )
enrollment19 = pd.read_csv("/Users/josefinadesolminihac/Desktop/computing/Project2/DataenrollmentChile/2019.csv", sep=';' )
latlon = pd.read_csv('/Users/josefinadesolminihac/Desktop/Intro to Infographics/Final Exercise/Latitud - Longitud Chile.csv', sep=',' )
/var/folders/hk/7xwmy2rx1wndbwgft1hwjm3h0000gn/T/ipykernel_20302/1441070996.py:2: DtypeWarning: Columns (49) have mixed types. Specify dtype option on import or set low_memory=False.
enrollment24 = pd.read_csv("/Users/josefinadesolminihac/Desktop/computing/Project2/DataenrollmentChile/2024.csv", sep=';' )
/var/folders/hk/7xwmy2rx1wndbwgft1hwjm3h0000gn/T/ipykernel_20302/1441070996.py:3: DtypeWarning: Columns (49,50) have mixed types. Specify dtype option on import or set low_memory=False.
enrollment23 = pd.read_csv("/Users/josefinadesolminihac/Desktop/computing/Project2/DataenrollmentChile/2023.csv", sep=';' )
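The DtypeWarning above appears because pandas reads large files in chunks and guesses a type for each chunk; columns 49 and 50 mix numbers and text. Below is a minimal sketch of one way to avoid the warning by declaring those columns as strings when reading. The inline sample is made up for illustration and stands in for the real yearly files, and treating the comma in values such as "0,7" as a decimal separator is an assumption.

```python
import io
import pandas as pd

# Hypothetical miniature of one yearly file: ';' separated, with cost
# columns that mix plain integers and decimal-comma text such as "0,7".
sample_csv = io.StringIO(
    "cat_periodo;costo_proceso_titulacion;costo_obtencion_titulo_diploma\n"
    "2024;0;0\n"
    "2024;0;0,7\n"
    "2024;258445;70000\n"
)

# Declaring the mixed columns as strings avoids per-chunk type guessing.
mixed_cols = {
    "costo_proceso_titulacion": str,
    "costo_obtencion_titulo_diploma": str,
}
sample = pd.read_csv(sample_csv, sep=";", dtype=mixed_cols)

# If numbers are needed later: swap the decimal comma for a point
# (assumption) and turn anything unparseable into NaN.
sample["diploma_cost_num"] = pd.to_numeric(
    sample["costo_obtencion_titulo_diploma"].str.replace(",", ".", regex=False),
    errors="coerce",
)
print(sample.dtypes)
```

The same `dtype` argument could be passed to each of the `pd.read_csv` calls above; `low_memory=False`, as the warning suggests, is another option at the cost of more memory while reading.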
# I want to see what the data looks like. I did this step for every year to check that they all have the same columns.
enrollment24.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1385814 entries, 0 to 1385813
Data columns (total 52 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 cat_periodo 1385814 non-null int64
1 id 1385814 non-null int64
2 codigo_unico 1385814 non-null object
3 mrun 1383843 non-null float64
4 gen_alu 1385814 non-null int64
5 fec_nac_alu 1385814 non-null int64
6 rango_edad 1385814 non-null object
7 anio_ing_carr_ori 1385814 non-null int64
8 sem_ing_carr_ori 1385814 non-null int64
9 anio_ing_carr_act 1385814 non-null int64
10 sem_ing_carr_act 1385814 non-null int64
11 tipo_inst_1 1385814 non-null object
12 tipo_inst_2 1385814 non-null object
13 tipo_inst_3 1385814 non-null object
14 cod_inst 1385814 non-null int64
15 nomb_inst 1385814 non-null object
16 cod_sede 1385814 non-null int64
17 nomb_sede 1385814 non-null object
18 cod_carrera 1385814 non-null int64
19 nomb_carrera 1385814 non-null object
20 modalidad 1385814 non-null object
21 jornada 1385814 non-null object
22 version 1385814 non-null int64
23 tipo_plan_carr 1385814 non-null object
24 dur_estudio_carr 1385814 non-null int64
25 dur_proceso_tit 1385814 non-null int64
26 dur_total_carr 1385814 non-null int64
27 region_sede 1385814 non-null object
28 provincia_sede 1385814 non-null object
29 comuna_sede 1385814 non-null object
30 nivel_global 1385814 non-null object
31 nivel_carrera_1 1385814 non-null object
32 nivel_carrera_2 1385814 non-null object
33 requisito_ingreso 1385814 non-null object
34 vigencia_carrera 1385814 non-null object
35 formato_valores 1385814 non-null object
36 valor_matricula 1385543 non-null float64
37 valor_arancel 1385543 non-null float64
38 codigo_demre 814250 non-null float64
39 area_conocimiento 1385814 non-null object
40 cine_f_97_area 1385814 non-null object
41 cine_f_97_subarea 1385814 non-null object
42 area_carrera_generica 1385814 non-null object
43 cine_f_13_area 1385814 non-null object
44 cine_f_13_subarea 1385814 non-null object
45 acreditada_carr 1385814 non-null object
46 acreditada_inst 1385814 non-null object
47 acre_inst_desde_hasta 1345987 non-null object
48 acre_inst_anio 1345987 non-null float64
49 costo_proceso_titulacion 1385543 non-null object
50 costo_obtencion_titulo_diploma 1385543 non-null object
51 forma_ingreso 1385814 non-null object
dtypes: float64(5), int64(15), object(32)
memory usage: 549.8+ MB
# I want to see what the data looks like
enrollment24.head()
| | cat_periodo | id | codigo_unico | mrun | gen_alu | fec_nac_alu | rango_edad | anio_ing_carr_ori | sem_ing_carr_ori | anio_ing_carr_act | ... | area_carrera_generica | cine_f_13_area | cine_f_13_subarea | acreditada_carr | acreditada_inst | acre_inst_desde_hasta | acre_inst_anio | costo_proceso_titulacion | costo_obtencion_titulo_diploma | forma_ingreso |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2024 | 1251127 | I117S1C35J4V1 | 5.0 | 2 | 199105 | 30 a 34 años | 2024 | 1 | 2024 | ... | Técnico en Prevención de Riesgos | Servicios | Servicios de Higiene y Salud Ocupacional | NO ACREDITADA | ACREDITADA | 01/06/2022 AL 01/06/2026 | 4.0 | 0 | 0 | 1- Ingreso Directo (regular) |
| 1 | 2024 | 119610 | I45S2C4J1V1 | 26.0 | 2 | 200212 | 20 a 24 años | 2022 | 1 | 2022 | ... | Derecho | Administración de Empresas y Derecho | Derecho | NO ACREDITADA | ACREDITADA | 17/12/2021 AL 16/12/2027 | 6.0 | 0 | 0,7 | 1- Ingreso Directo (regular) |
| 2 | 2024 | 893498 | I31S2C41J1V1 | 35.0 | 1 | 199711 | 25 a 29 años | 2017 | 1 | 2017 | ... | Enfermería | Salud y Bienestar | Salud | NO ACREDITADA | ACREDITADA | 29/10/2019 AL 29/10/2024 | 5.0 | 258445 | 70000 | 1- Ingreso Directo (regular) |
| 3 | 2024 | 2004738 | I70S1C900J1V1 | 43.0 | 2 | 199007 | 30 a 34 años | 2020 | 1 | 2020 | ... | Doctorado en Ciencias Sociales | Ciencias Sociales, Periodismo e Información | Ciencias Sociales y del Comportamiento | ACREDITADA | ACREDITADA | 22/12/2018 AL 22/12/2025 | 7.0 | 0 | 112000 | 1- Ingreso Directo (regular) |
| 4 | 2024 | 812831 | I143S27C34J2V1 | 51.0 | 2 | 198305 | 40 y más años | 2022 | 1 | 2022 | ... | Técnico en Enfermería | Salud y Bienestar | Salud | NO ACREDITADA | ACREDITADA | 23/11/2022 AL 23/11/2027 | 5.0 | 858000 | 0 | 1- Ingreso Directo (regular) |
5 rows × 52 columns
# Because they have the same columns, I wanted to combine all the information into one big data frame
dfs = [enrollment24, enrollment23, enrollment22, enrollment21, enrollment20, enrollment19]
enrollment19_24 = pd.concat(dfs)
enrollment19_24
| | cat_periodo | id | codigo_unico | mrun | gen_alu | fec_nac_alu | rango_edad | anio_ing_carr_ori | sem_ing_carr_ori | anio_ing_carr_act | ... | area_carrera_generica | cine_f_13_area | cine_f_13_subarea | acreditada_carr | acreditada_inst | acre_inst_desde_hasta | acre_inst_anio | costo_proceso_titulacion | costo_obtencion_titulo_diploma | forma_ingreso |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2024 | 1251127 | I117S1C35J4V1 | 5.0 | 2 | 199105 | 30 a 34 años | 2024 | 1 | 2024 | ... | Técnico en Prevención de Riesgos | Servicios | Servicios de Higiene y Salud Ocupacional | NO ACREDITADA | ACREDITADA | 01/06/2022 AL 01/06/2026 | 4.0 | 0 | 0 | 1- Ingreso Directo (regular) |
| 1 | 2024 | 119610 | I45S2C4J1V1 | 26.0 | 2 | 200212 | 20 a 24 años | 2022 | 1 | 2022 | ... | Derecho | Administración de Empresas y Derecho | Derecho | NO ACREDITADA | ACREDITADA | 17/12/2021 AL 16/12/2027 | 6.0 | 0 | 0,7 | 1- Ingreso Directo (regular) |
| 2 | 2024 | 893498 | I31S2C41J1V1 | 35.0 | 1 | 199711 | 25 a 29 años | 2017 | 1 | 2017 | ... | Enfermería | Salud y Bienestar | Salud | NO ACREDITADA | ACREDITADA | 29/10/2019 AL 29/10/2024 | 5.0 | 258445 | 70000 | 1- Ingreso Directo (regular) |
| 3 | 2024 | 2004738 | I70S1C900J1V1 | 43.0 | 2 | 199007 | 30 a 34 años | 2020 | 1 | 2020 | ... | Doctorado en Ciencias Sociales | Ciencias Sociales, Periodismo e Información | Ciencias Sociales y del Comportamiento | ACREDITADA | ACREDITADA | 22/12/2018 AL 22/12/2025 | 7.0 | 0 | 112000 | 1- Ingreso Directo (regular) |
| 4 | 2024 | 812831 | I143S27C34J2V1 | 51.0 | 2 | 198305 | 40 y más años | 2022 | 1 | 2022 | ... | Técnico en Enfermería | Salud y Bienestar | Salud | NO ACREDITADA | ACREDITADA | 23/11/2022 AL 23/11/2027 | 5.0 | 858000 | 0 | 1- Ingreso Directo (regular) |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1268478 | 2019 | 2041045 | I70S1C760J2V1 | NaN | 1 | 199008 | 25 a 29 años | 2017 | 1 | 2017 | ... | Magister en Administración y Comercio | Administración de Empresas y Derecho | Educación Comercial y Administración | ACREDITADA | ACREDITADA | 22/12/2018 AL 22/12/2025 | 7.0 | NaN | NaN | 6- Ingreso especial para estudiantes extranjeros |
| 1268479 | 2019 | 2055836 | I86S1C924J1V1 | NaN | 2 | 198805 | 30 a 34 años | 2019 | 1 | 2019 | ... | Postítulo en Ciencias Sociales | Ciencias Sociales, Periodismo e Información | Periodismo e Información | NO ACREDITADA | ACREDITADA | 01/12/2018 AL 01/12/2025 | 7.0 | NaN | NaN | 6- Ingreso especial para estudiantes extranjeros |
| 1268480 | 2019 | 2014078 | I20S1C47J4V1 | NaN | 1 | 197502 | 40 y más años | 2018 | 1 | 2018 | ... | Magister en Educación | Educación | Educación | NO ACREDITADA | ACREDITADA | 24/12/2017 AL 24/12/2021 | 4.0 | NaN | NaN | 6- Ingreso especial para estudiantes extranjeros |
| 1268481 | 2019 | 2038820 | I70S1C520J1V1 | NaN | 1 | 199402 | 25 a 29 años | 2019 | 1 | 2019 | ... | Doctorado en Ciencias Básicas | Ciencias naturales, matemáticas y estadística | Ciencias Físicas | ACREDITADA | ACREDITADA | 22/12/2018 AL 22/12/2025 | 7.0 | NaN | NaN | 6- Ingreso especial para estudiantes extranjeros |
| 1268482 | 2019 | 628229 | I155S2C39J2V1 | NaN | 2 | 200008 | 15 a 19 años | 2019 | 1 | 2019 | ... | Administración de Empresas e Ing. Asociadas | Administración de Empresas y Derecho | Educación Comercial y Administración | NO ACREDITADA | NO ACREDITADA | NaN | NaN | NaN | NaN | 1- Ingreso Directo (regular) |
7815459 rows × 52 columns
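Note that `pd.concat(dfs)` keeps each year's original row labels, so the combined index repeats: the last row above is labeled 1268482 even though the frame has 7815459 rows. The small sketch below, with toy frames standing in for the yearly data (values are illustrative only), shows how `ignore_index=True` would give every row a unique label instead.

```python
import pandas as pd

# Toy stand-ins for two yearly frames (values are illustrative only).
toy24 = pd.DataFrame({"cat_periodo": [2024, 2024], "gen_alu": [2, 1]})
toy23 = pd.DataFrame({"cat_periodo": [2023, 2023, 2023], "gen_alu": [1, 2, 2]})

# Default behaviour: each frame keeps its own row labels, so labels repeat.
stacked = pd.concat([toy24, toy23])

# With ignore_index=True the result gets a fresh 0..n-1 index.
stacked_unique = pd.concat([toy24, toy23], ignore_index=True)

print(stacked.index.tolist())         # [0, 1, 0, 1, 2]
print(stacked_unique.index.tolist())  # [0, 1, 2, 3, 4]
```

Repeated labels do not affect the `groupby` counts used later, but they can be surprising when selecting rows with `.loc`.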
#Because the names of the columns are in Spanish, and I'm not planning to use all columns, I will create a new data frame with the columns that I'm planning to use and change the names to English.
enrollment19_24.columns
Index(['cat_periodo', 'id', 'codigo_unico', 'mrun', 'gen_alu', 'fec_nac_alu',
'rango_edad', 'anio_ing_carr_ori', 'sem_ing_carr_ori',
'anio_ing_carr_act', 'sem_ing_carr_act', 'tipo_inst_1', 'tipo_inst_2',
'tipo_inst_3', 'cod_inst', 'nomb_inst', 'cod_sede', 'nomb_sede',
'cod_carrera', 'nomb_carrera', 'modalidad', 'jornada', 'version',
'tipo_plan_carr', 'dur_estudio_carr', 'dur_proceso_tit',
'dur_total_carr', 'region_sede', 'provincia_sede', 'comuna_sede',
'nivel_global', 'nivel_carrera_1', 'nivel_carrera_2',
'requisito_ingreso', 'vigencia_carrera', 'formato_valores',
'valor_matricula', 'valor_arancel', 'codigo_demre', 'area_conocimiento',
'cine_f_97_area', 'cine_f_97_subarea', 'area_carrera_generica',
'cine_f_13_area', 'cine_f_13_subarea', 'acreditada_carr',
'acreditada_inst', 'acre_inst_desde_hasta', 'acre_inst_anio',
'costo_proceso_titulacion', 'costo_obtencion_titulo_diploma',
'forma_ingreso'],
dtype='object')
#Choose the columns that I will use
filetered_data = enrollment19_24[['cat_periodo', 'codigo_unico', 'gen_alu', 'fec_nac_alu',
'rango_edad','anio_ing_carr_act', 'tipo_inst_1', 'tipo_inst_2',
'tipo_inst_3', 'cod_inst', 'nomb_inst', 'cod_sede', 'nomb_sede',
'cod_carrera', 'nomb_carrera', 'region_sede', 'provincia_sede', 'comuna_sede',
'valor_matricula', 'valor_arancel', 'area_conocimiento', 'acreditada_carr',
'acreditada_inst', 'acre_inst_desde_hasta', 'acre_inst_anio',
'costo_proceso_titulacion', 'costo_obtencion_titulo_diploma']]
filetered_data.columns
Index(['cat_periodo', 'codigo_unico', 'gen_alu', 'fec_nac_alu', 'rango_edad',
'anio_ing_carr_act', 'tipo_inst_1', 'tipo_inst_2', 'tipo_inst_3',
'cod_inst', 'nomb_inst', 'cod_sede', 'nomb_sede', 'cod_carrera',
'nomb_carrera', 'region_sede', 'provincia_sede', 'comuna_sede',
'valor_matricula', 'valor_arancel', 'area_conocimiento',
'acreditada_carr', 'acreditada_inst', 'acre_inst_desde_hasta',
'acre_inst_anio', 'costo_proceso_titulacion',
'costo_obtencion_titulo_diploma'],
dtype='object')
#Rename the columns so they are in English and more descriptive.
# Define a dictionary with current column names as keys and new names as values
column_renames = {
'cat_periodo': 'Year',
'codigo_unico': 'Unique Code',
'gen_alu': 'Student Gender',
'fec_nac_alu': 'Birth Date',
'rango_edad': 'Age Range',
'anio_ing_carr_act': 'Year Entered Current Program',
'tipo_inst_1': 'Institution Type_1',
'tipo_inst_2': 'Institution Type_2',
'tipo_inst_3': 'Institution Type_3',
'cod_inst': 'Institution Code',
'nomb_inst': 'Institution Name',
'cod_sede': 'Campus Code',
'nomb_sede': 'Campus Name',
'cod_carrera': 'Program Code',
'nomb_carrera': 'Program Name',
'region_sede': 'Campus Region',
'provincia_sede': 'Campus Province',
'comuna_sede': 'Campus Commune',
'valor_matricula': 'Tuition Value',
'valor_arancel': 'Fee Value',
'area_conocimiento': 'Knowledge Area',
'costo_proceso_titulacion': 'Graduation Process Cost',
'costo_obtencion_titulo_diploma': 'Diploma_Cost'
}
updated_data = filetered_data.rename(columns=column_renames)
updated_data.columns
Index(['Year', 'Unique Code', 'Student Gender', 'Birth Date', 'Age Range',
'Year Entered Current Program', 'Institution Type_1',
'Institution Type_2', 'Institution Type_3', 'Institution Code',
'Institution Name', 'Campus Code', 'Campus Name', 'Program Code',
'Program Name', 'Campus Region', 'Campus Province', 'Campus Commune',
'Tuition Value', 'Fee Value', 'Knowledge Area', 'acreditada_carr',
'acreditada_inst', 'acre_inst_desde_hasta', 'acre_inst_anio',
'Graduation Process Cost', 'Diploma_Cost'],
dtype='object')
#Now I want to see if I need to change the type of certain columns. For example, year, birth date, and year entered current program
updated_data.info()
<class 'pandas.core.frame.DataFrame'>
Index: 7815459 entries, 0 to 1268482
Data columns (total 27 columns):
# Column Dtype
--- ------ -----
0 Year int64
1 Unique Code object
2 Student Gender int64
3 Birth Date int64
4 Age Range object
5 Year Entered Current Program int64
6 Institution Type_1 object
7 Institution Type_2 object
8 Institution Type_3 object
9 Institution Code int64
10 Institution Name object
11 Campus Code int64
12 Campus Name object
13 Program Code int64
14 Program Name object
15 Campus Region object
16 Campus Province object
17 Campus Commune object
18 Tuition Value float64
19 Fee Value float64
20 Knowledge Area object
21 acreditada_carr object
22 acreditada_inst object
23 acre_inst_desde_hasta object
24 acre_inst_anio float64
25 Graduation Process Cost object
26 Diploma_Cost object
dtypes: float64(3), int64(7), object(17)
memory usage: 1.6+ GB
#I tried changing the type int64 on the dates to date type but had problems with the visualizations.
#updated_data["Year"] = pd.to_datetime(updated_data["Year"], format="%Y")
#updated_data["Year Entered Current Program"] = pd.to_datetime(updated_data["Year Entered Current Program"], format="%Y")
#updated_data.info()
#I changed the gender type because it appeared as an integer and was categorical.
updated_data["Student Gender"]=updated_data["Student Gender"].astype(str)
updated_data.info()
<class 'pandas.core.frame.DataFrame'>
Index: 7815459 entries, 0 to 1268482
Data columns (total 27 columns):
# Column Dtype
--- ------ -----
0 Year int64
1 Unique Code object
2 Student Gender object
3 Birth Date int64
4 Age Range object
5 Year Entered Current Program int64
6 Institution Type_1 object
7 Institution Type_2 object
8 Institution Type_3 object
9 Institution Code int64
10 Institution Name object
11 Campus Code int64
12 Campus Name object
13 Program Code int64
14 Program Name object
15 Campus Region object
16 Campus Province object
17 Campus Commune object
18 Tuition Value float64
19 Fee Value float64
20 Knowledge Area object
21 acreditada_carr object
22 acreditada_inst object
23 acre_inst_desde_hasta object
24 acre_inst_anio float64
25 Graduation Process Cost object
26 Diploma_Cost object
dtypes: float64(3), int64(6), object(18)
memory usage: 1.6+ GB
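As a possible alternative to the date conversion that caused problems, the sketch below leaves `Year` as an integer (so charts treat it as a plain number) and only parses `Birth Date`. It assumes the birth dates are encoded as YYYYMM, for example 199105 for May 1991; this matches the sample rows but is not confirmed by a data dictionary. It also stores gender as a `category`, which is usually more compact than plain strings. The small frame is illustrative only.

```python
import pandas as pd

# Illustrative rows only; column names follow the renamed data frame.
toy = pd.DataFrame({
    "Year": [2024, 2024, 2019],
    "Student Gender": [2, 1, 2],
    "Birth Date": [199105, 200212, 197502],
})

# Assumption: Birth Date is encoded as YYYYMM, so it is parsed as year + month.
toy["Birth Date"] = pd.to_datetime(toy["Birth Date"].astype(str), format="%Y%m")

# Gender codes are labels, not quantities: store them as a categorical column.
toy["Student Gender"] = toy["Student Gender"].astype(str).astype("category")

print(toy.dtypes)
```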
#Only kept the columns that I'm planning to use
onlyyears_data = updated_data[[ 'Year', 'Unique Code', 'Student Gender','Institution Name', 'Campus Region', 'Institution Type_1']]
onlyyears_data.head()
| | Year | Unique Code | Student Gender | Institution Name | Campus Region | Institution Type_1 |
|---|---|---|---|---|---|---|
| 0 | 2024 | I117S1C35J4V1 | 2 | IP IACC | Metropolitana | Institutos Profesionales |
| 1 | 2024 | I45S2C4J1V1 | 2 | UNIVERSIDAD DEL DESARROLLO | Metropolitana | Universidades |
| 2 | 2024 | I31S2C41J1V1 | 1 | UNIVERSIDAD AUTONOMA DE CHILE | Maule | Universidades |
| 3 | 2024 | I70S1C900J1V1 | 2 | UNIVERSIDAD DE CHILE | Metropolitana | Universidades |
| 4 | 2024 | I143S27C34J2V1 | 2 | IP AIEP | Metropolitana | Institutos Profesionales |
#For the first visualization, I wanted to see how enrollment has changed over time. So, I grouped them by year, counting the number of students enrolled.
all_years = onlyyears_data.groupby('Year')['Unique Code'].size().to_frame(name='Total Students Enrolled').reset_index()
all_years
| | Year | Total Students Enrolled |
|---|---|---|
| 0 | 2019 | 1268483 |
| 1 | 2020 | 1220898 |
| 2 | 2021 | 1294699 |
| 3 | 2022 | 1304132 |
| 4 | 2023 | 1341433 |
| 5 | 2024 | 1385814 |
#For this first figure, I chose a line graph to show how enrollment changes over time.
fig1 = px.line(
all_years,
x='Year',
y='Total Students Enrolled',
title='Total Students Enrolled Over Time')
fig1
First, I’d like to explain the general structure of the academic year in Chile. The academic year starts in March and ends in December. While some students begin their studies mid-year (similar to the J-term at SIPA), the majority follow the standard process: they take the admissions test in December and start their academic year in March of the following year.
In this graph, we observe an increasing trend in student enrollment, except for 2020, when enrollment fell from 1,268,483 to 1,220,898 students. This exception is understandable, given that 2020 was the year of the COVID-19 pandemic. Since the pandemic began in Chile around mid-March, the same month the academic year starts, it makes sense that many students chose not to enroll. Aside from this anomaly, enrollment rose every year, reaching 1,385,814 students in 2024.
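To put a number on the 2020 dip, the year-over-year change can be computed directly from the totals. This is a small sketch that retypes the values from the `all_years` table; the frame name `all_years_demo` and the new column name are illustrative.

```python
import pandas as pd

# Totals copied from the all_years table above.
all_years_demo = pd.DataFrame({
    "Year": [2019, 2020, 2021, 2022, 2023, 2024],
    "Total Students Enrolled": [1268483, 1220898, 1294699, 1304132, 1341433, 1385814],
})

# Percentage change relative to the previous year (NaN for the first year).
all_years_demo["YoY Change (%)"] = (
    all_years_demo["Total Students Enrolled"].pct_change() * 100
).round(2)

print(all_years_demo)
```

Only the 2020 row comes out negative; every later year shows growth over the year before.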
#Now I want to create a second visualization to see whether enrollment differs between women and men and how it evolved through the years.
gender = onlyyears_data.groupby(["Year", "Student Gender"])["Unique Code"].size().reset_index(name="Count")
gender["Student Gender"] = gender["Student Gender"].map({"1": "Men", "2": "Women"})
gender
| | Year | Student Gender | Count |
|---|---|---|---|
| 0 | 2019 | Men | 595637 |
| 1 | 2019 | Women | 672846 |
| 2 | 2020 | Men | 568913 |
| 3 | 2020 | Women | 651985 |
| 4 | 2021 | Men | 593534 |
| 5 | 2021 | Women | 701165 |
| 6 | 2022 | Men | 602296 |
| 7 | 2022 | Women | 701836 |
| 8 | 2023 | Men | 623543 |
| 9 | 2023 | Women | 717890 |
| 10 | 2024 | Men | 647133 |
| 11 | 2024 | Women | 738681 |
fig2 = px.line(
gender,
x='Year',
y='Count',
color='Student Gender',
title='Total Students Enrolled Over Time by Gender',
labels={'Year': 'Year', 'Count': 'Total Students Enrolled', 'Student Gender':'Student Gender'})
fig2
We can see that men and women follow the same trend. As has been observed for several years, more women than men enroll in higher education institutions.
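The gap can also be expressed as the share of women in total enrollment. The sketch below retypes the counts from the `gender` table; the names `gender_demo`, `wide`, and the share column are illustrative.

```python
import pandas as pd

# Counts copied from the gender table above.
gender_demo = pd.DataFrame({
    "Year": [2019, 2019, 2020, 2020, 2021, 2021,
             2022, 2022, 2023, 2023, 2024, 2024],
    "Student Gender": ["Men", "Women"] * 6,
    "Count": [595637, 672846, 568913, 651985, 593534, 701165,
              602296, 701836, 623543, 717890, 647133, 738681],
})

# Reshape to one row per year and one column per gender.
wide = gender_demo.pivot(index="Year", columns="Student Gender", values="Count")

# Women as a percentage of all enrolled students in each year.
wide["Women Share (%)"] = (wide["Women"] / (wide["Men"] + wide["Women"]) * 100).round(1)

print(wide)
```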
#I want to create a third visualization to see whether enrollment differs by type of institution and how it evolved through the years.
typeofinstitution = onlyyears_data.groupby(["Year", "Institution Type_1"])["Unique Code"].size().reset_index(name="Count")
typeofinstitution["Institution Type_1"] = typeofinstitution["Institution Type_1"].map({"Institutos Profesionales": "Professional Institutes", "Universidades": "Universities", "Centros de Formación Técnica": "Technical Training Centers"})
typeofinstitution
| | Year | Institution Type_1 | Count |
|---|---|---|---|
| 0 | 2019 | Technical Training Centers | 137928 |
| 1 | 2019 | Professional Institutes | 381412 |
| 2 | 2019 | Universities | 749143 |
| 3 | 2020 | Technical Training Centers | 130346 |
| 4 | 2020 | Professional Institutes | 362030 |
| 5 | 2020 | Universities | 728522 |
| 6 | 2021 | Technical Training Centers | 134475 |
| 7 | 2021 | Professional Institutes | 379838 |
| 8 | 2021 | Universities | 780386 |
| 9 | 2022 | Technical Training Centers | 131739 |
| 10 | 2022 | Professional Institutes | 397705 |
| 11 | 2022 | Universities | 774688 |
| 12 | 2023 | Technical Training Centers | 136730 |
| 13 | 2023 | Professional Institutes | 419431 |
| 14 | 2023 | Universities | 785272 |
| 15 | 2024 | Technical Training Centers | 145230 |
| 16 | 2024 | Professional Institutes | 426334 |
| 17 | 2024 | Universities | 814250 |
fig3 = px.line(
typeofinstitution,
x='Year',
y='Count',
color='Institution Type_1',
title='Total Students Enrolled Over Time by Institution Type',
labels={'Year': 'Year', 'Count': 'Total Students Enrolled', 'Institution Type_1':'Institution Type'})
fig3
In Chile, there are three types of institutions recognized by the Ministry of Education that can grant professional and technical degrees: Universities, Professional Institutes, and Technical Training Centers. With this visualization, I aimed to analyze changes over time by institution type. However, due to the significant difference in the number of students enrolled in each type, it is challenging to determine whether there was a substantial change during this period. What this graph clearly shows is that universities have the highest number of enrolled students, followed by professional institutes, and finally, technical training centers.
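One way to compare trends when the groups differ this much in size is to index each institution type to its own 2019 value, so that every series starts at 100. The following is a sketch of that idea, not the approach used in the rest of this project; it retypes the counts from the `typeofinstitution` table, and the names `inst_demo` and `Index (2019=100)` are illustrative.

```python
import pandas as pd

# Counts copied from the typeofinstitution table above.
inst_demo = pd.DataFrame({
    "Year": [y for y in range(2019, 2025) for _ in range(3)],
    "Institution Type_1": ["Technical Training Centers",
                           "Professional Institutes",
                           "Universities"] * 6,
    "Count": [137928, 381412, 749143,
              130346, 362030, 728522,
              134475, 379838, 780386,
              131739, 397705, 774688,
              136730, 419431, 785272,
              145230, 426334, 814250],
})

# Sort so that the first row of each group is its 2019 value.
inst_demo = inst_demo.sort_values(["Institution Type_1", "Year"])

# Divide every count by its group's 2019 count and scale to 100.
base = inst_demo.groupby("Institution Type_1")["Count"].transform("first")
inst_demo["Index (2019=100)"] = (inst_demo["Count"] / base * 100).round(1)

print(inst_demo[inst_demo["Year"] == 2024])
```

Passing the indexed column as `y` to `px.line` would put all three institution types on a comparable scale in a single chart.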
typeofinstitution1 = typeofinstitution[typeofinstitution['Institution Type_1']=='Universities']
fig4 = px.line(
typeofinstitution1,
x='Year',
y='Count',
color='Institution Type_1',
title='Total Students Enrolled Over Time in Universities',
labels={'Year': 'Year', 'Count': 'Total Students Enrolled', 'Institution Type_1':'Institution Type'})
fig4
Now, I will present three separate graphs, each focusing on one of the three types of institutions. The first graph is for universities. Similar to the general trend, we observe a decrease in enrollment in 2020, followed by a recovery in 2021, another decrease in 2022, which appears unusual, and then an increase in 2023 and 2024.
typeofinstitution1 = typeofinstitution[typeofinstitution['Institution Type_1']=='Technical Training Centers']
fig5 = px.line(
typeofinstitution1,
x='Year',
y='Count',
color='Institution Type_1',
title='Total Students Enrolled Over Time in Technical Training Centers',
labels={'Year': 'Year', 'Count': 'Total Students Enrolled', 'Institution Type_1':'Institution Type'})
fig5
In this second graph, we observe the evolution of enrollment in Technical Training Centers. The trend closely mirrors that of universities, with a decrease in enrollment in 2022.
typeofinstitution1 = typeofinstitution[typeofinstitution['Institution Type_1']=='Professional Institutes']
fig6 = px.line(
typeofinstitution1,
x='Year',
y='Count',
color='Institution Type_1',
title='Total Students Enrolled Over Time in Professional Institutes',
labels={'Year': 'Year', 'Count': 'Total Students Enrolled', 'Institution Type_1':'Institution Type'})
fig6
In this third graph, we observe the evolution of enrollment in Professional Institutes. Unlike universities and technical training centers, this institution type shows a consistent increase in enrollment every year following 2020.
#Now I want to see more detail for year 2024
year = 2024
data_year = onlyyears_data[onlyyears_data["Year"] == year]
data_year
| | Year | Unique Code | Student Gender | Institution Name | Campus Region | Institution Type_1 |
|---|---|---|---|---|---|---|
| 0 | 2024 | I117S1C35J4V1 | 2 | IP IACC | Metropolitana | Institutos Profesionales |
| 1 | 2024 | I45S2C4J1V1 | 2 | UNIVERSIDAD DEL DESARROLLO | Metropolitana | Universidades |
| 2 | 2024 | I31S2C41J1V1 | 1 | UNIVERSIDAD AUTONOMA DE CHILE | Maule | Universidades |
| 3 | 2024 | I70S1C900J1V1 | 2 | UNIVERSIDAD DE CHILE | Metropolitana | Universidades |
| 4 | 2024 | I143S27C34J2V1 | 2 | IP AIEP | Metropolitana | Institutos Profesionales |
| ... | ... | ... | ... | ... | ... | ... |
| 1385809 | 2024 | I88S1C227J1V1 | 1 | UNIVERSIDAD TECNICA FEDERICO SANTA MARIA | Valparaíso | Universidades |
| 1385810 | 2024 | I116S36C51J1V1 | 2 | IP SANTO TOMAS | Ñuble | Institutos Profesionales |
| 1385811 | 2024 | I86S1C1446J4V1 | 1 | PONTIFICIA UNIVERSIDAD CATOLICA DE CHILE | Metropolitana | Universidades |
| 1385812 | 2024 | I312S1C116J4V1 | 1 | CFT CENCO | Metropolitana | Centros de Formación Técnica |
| 1385813 | 2024 | I143S2C251J1V1 | 2 | IP AIEP | Antofagasta | Institutos Profesionales |
1385814 rows × 6 columns
data_sorted = data_year.groupby(["Institution Name", "Institution Type_1"]).size().reset_index(name="Students Enrolled")
# Sort the data by 'Students Enrolled' in descending order
data_sorted2 = data_sorted.sort_values(by="Students Enrolled", ascending=False)
data_sorted2.head(20)
| | Institution Name | Institution Type_1 | Students Enrolled |
|---|---|---|---|
| 50 | IP DUOC UC | Institutos Profesionales | 105295 |
| 41 | IP AIEP | Institutos Profesionales | 87763 |
| 77 | UNIVERSIDAD ANDRES BELLO | Universidades | 64226 |
| 59 | IP INACAP | Institutos Profesionales | 57042 |
| 71 | PONTIFICIA UNIVERSIDAD CATOLICA DE CHILE | Universidades | 54939 |
| 122 | UNIVERSIDAD SAN SEBASTIAN | Universidades | 49509 |
| 25 | CFT INACAP | Centros de Formación Técnica | 46795 |
| 95 | UNIVERSIDAD DE CHILE | Universidades | 45324 |
| 64 | IP LATINOAMERICANO DE COMERCIO EXTERIOR - IPLACEX | Institutos Profesionales | 41248 |
| 38 | CFT SANTO TOMAS | Centros de Formación Técnica | 38729 |
| 80 | UNIVERSIDAD AUTONOMA DE CHILE | Universidades | 32279 |
| 58 | IP IACC | Institutos Profesionales | 32095 |
| 99 | UNIVERSIDAD DE LAS AMERICAS | Universidades | 31708 |
| 96 | UNIVERSIDAD DE CONCEPCION | Universidades | 28491 |
| 70 | IP SANTO TOMAS | Institutos Profesionales | 27468 |
| 123 | UNIVERSIDAD SANTO TOMAS | Universidades | 26929 |
| 105 | UNIVERSIDAD DE SANTIAGO DE CHILE | Universidades | 24725 |
| 119 | UNIVERSIDAD MAYOR | Universidades | 24527 |
| 100 | UNIVERSIDAD DE LOS ANDES | Universidades | 23316 |
| 125 | UNIVERSIDAD TECNICA FEDERICO SANTA MARIA | Universidades | 22652 |
fig7 = px.bar(data_sorted2,
x="Institution Name",
y="Students Enrolled",
title='Number of students enrolled in each Higher Education Institution in 2024')
fig7.show()
This bar graph shows the institutions ordered by the number of students enrolled. All three types of institutions are represented in the top ten, although the top five contains only Professional Institutes and Universities. Notably, IP DUOC UC, a Professional Institute, stands out as the institution with the highest number of enrolled students for 2024.
#Now I want to see only the top 10
#The threshold is the enrollment of the 10th-largest institution in 2024 (CFT SANTO TOMAS in the sorted table above)
max_num = 38729
top_10 = data_sorted2["Students Enrolled"] >= max_num
top_10df = data_sorted2[top_10]
top_10df
| | Institution Name | Institution Type_1 | Students Enrolled |
|---|---|---|---|
| 50 | IP DUOC UC | Institutos Profesionales | 105295 |
| 41 | IP AIEP | Institutos Profesionales | 87763 |
| 77 | UNIVERSIDAD ANDRES BELLO | Universidades | 64226 |
| 59 | IP INACAP | Institutos Profesionales | 57042 |
| 71 | PONTIFICIA UNIVERSIDAD CATOLICA DE CHILE | Universidades | 54939 |
| 122 | UNIVERSIDAD SAN SEBASTIAN | Universidades | 49509 |
| 25 | CFT INACAP | Centros de Formación Técnica | 46795 |
| 95 | UNIVERSIDAD DE CHILE | Universidades | 45324 |
| 64 | IP LATINOAMERICANO DE COMERCIO EXTERIOR - IPLACEX | Institutos Profesionales | 41248 |
| 38 | CFT SANTO TOMAS | Centros de Formación Técnica | 38729 |
fig7_24 = px.bar(top_10df,
x="Institution Name",
y="Students Enrolled",
title='Number of students enrolled in each Higher Education Institution in 2024')
year = 2023
data_year = onlyyears_data[onlyyears_data["Year"] == year]
data_sorted = data_year.groupby(["Institution Name", "Institution Type_1"]).size().reset_index(name="Students Enrolled")
data_sorted2 = data_sorted.sort_values(by="Students Enrolled", ascending=False)
data_sorted2.head(10)
| | Institution Name | Institution Type_1 | Students Enrolled |
|---|---|---|---|
| 51 | IP DUOC UC | Institutos Profesionales | 100780 |
| 42 | IP AIEP | Institutos Profesionales | 93451 |
| 79 | UNIVERSIDAD ANDRES BELLO | Universidades | 59075 |
| 60 | IP INACAP | Institutos Profesionales | 51795 |
| 124 | UNIVERSIDAD SAN SEBASTIAN | Universidades | 48175 |
| 97 | UNIVERSIDAD DE CHILE | Universidades | 45248 |
| 73 | PONTIFICIA UNIVERSIDAD CATOLICA DE CHILE | Universidades | 43250 |
| 24 | CFT INACAP | Centros de Formación Técnica | 41946 |
| 39 | CFT SANTO TOMAS | Centros de Formación Técnica | 38762 |
| 59 | IP IACC | Institutos Profesionales | 35863 |
max_num = 35863
top_10 = data_sorted2["Students Enrolled"] >= max_num
top_10df = data_sorted2[top_10]
fig7_23 = px.bar(top_10df,
x="Institution Name",
y="Students Enrolled",
title='Number of students enrolled in each Higher Education Institution in 2023')
year = 2022
data_year = onlyyears_data[onlyyears_data["Year"] == year]
data_sorted = data_year.groupby(["Institution Name", "Institution Type_1"]).size().reset_index(name="Students Enrolled")
data_sorted2 = data_sorted.sort_values(by="Students Enrolled", ascending=False)
data_sorted2.head(10)
| | Institution Name | Institution Type_1 | Students Enrolled |
|---|---|---|---|
| 56 | IP DUOC UC | Institutos Profesionales | 97256 |
| 48 | IP AIEP | Institutos Profesionales | 94774 |
| 84 | UNIVERSIDAD ANDRES BELLO | Universidades | 57645 |
| 65 | IP INACAP | Institutos Profesionales | 46809 |
| 129 | UNIVERSIDAD SAN SEBASTIAN | Universidades | 44191 |
| 78 | PONTIFICIA UNIVERSIDAD CATOLICA DE CHILE | Universidades | 42912 |
| 102 | UNIVERSIDAD DE CHILE | Universidades | 42904 |
| 30 | CFT INACAP | Centros de Formación Técnica | 41103 |
| 45 | CFT SANTO TOMAS | Centros de Formación Técnica | 37562 |
| 87 | UNIVERSIDAD AUTONOMA DE CHILE | Universidades | 31349 |
max_num = 31349
top_10 = data_sorted2["Students Enrolled"] >= max_num
top_10df = data_sorted2[top_10]
fig7_22 = px.bar(top_10df,
x="Institution Name",
y="Students Enrolled",
title='Number of students enrolled in each Higher Education Institution in 2022')
fig7_24
fig7_23
fig7_22
With these three graphs, I aimed to analyze whether the top 10 institutions, measured by enrollment numbers, have changed over the past three years. The top four institutions have remained consistent during this period, with IP DUOC UC maintaining the top position, followed by IP AIEP, Andrés Bello University, and IP INACAP. San Sebastián University held fifth place in 2022 and 2023 but was overtaken in 2024 by the Pontificia Universidad Católica de Chile. In the lower half of the top 10, we observe more variation over time.
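The per-year steps above (filter by year, count by institution, sort, keep the largest) repeat three times with a hand-typed threshold. As a sketch of how they could be wrapped in one function, the hypothetical helper `top_institutions` below uses `nlargest` so that no threshold has to be looked up. The toy data is illustrative only, with one row per enrolled student as in `onlyyears_data`.

```python
import pandas as pd

def top_institutions(df, year, n=10):
    """Return the n institutions with the most enrolled students in a year."""
    counts = (
        df[df["Year"] == year]
        .groupby(["Institution Name", "Institution Type_1"])
        .size()
        .reset_index(name="Students Enrolled")
    )
    # nlargest returns exactly n rows, sorted from largest to smallest.
    return counts.nlargest(n, "Students Enrolled")

# Toy data (illustrative only): one row per enrolled student.
toy = pd.DataFrame({
    "Year": [2024] * 6 + [2023] * 3,
    "Institution Name": ["A", "A", "A", "B", "B", "C", "A", "B", "B"],
    "Institution Type_1": ["Universidades"] * 9,
})

print(top_institutions(toy, 2024, n=2))
```

With the real data, a call such as `top_institutions(onlyyears_data, 2023)` would stand in for the filter, group, sort, and threshold steps. Unlike the `>=` threshold, `nlargest` returns exactly `n` rows even when institutions tie at the cutoff.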